
11 research outputs found

VIDA: a virus database system for the organization of animal virus genome open reading frames

Author: Alba MM
Kellam P
Lee D
Martin N
Orengo CA
Pearl FMG
Shepherd AJ
Publication venue: OXFORD UNIV PRESS
Publication date: 01/01/2001

VIDA is a new virus database that organizes open reading frames (ORFs) from partial and complete genomic sequences of animal viruses. Currently VIDA includes all sequences from GenBank for Herpesviridae, Coronaviridae and Arteriviridae. The ORFs are organized into homologous protein families, which are identified on the basis of sequence similarity relationships. Conserved sequence regions of potential functional importance are identified and can be retrieved as sequence alignments. We use a controlled taxonomical and functional classification for all the proteins and protein families in the database. When available, protein structures that are related to the families have also been included. The database is available for online search and sequence information retrieval at http://www.biochem.ucl.ac.uk/bsm/virus-database/VIDA.html

UCL Discovery

PubMed Central

Improving the performance of DomainDiscovery of protein domain boundary assignment using inter-domain linker index

Author: A Andreeva
Abdur R Sikder
Albert Y Zomaya
AR Sikder
FMG Pearl
G Pollastri
HM Berman
J Cheng
J Liu
J Sim
JE Gewehr
L Kong
M Dumontier
M Suyama
N Nagarajan
OV Galzitskaya
RA George
RL Marsden
S Veretnik
SF Altschul
SJ Wheelan
T Joachims
TA Holland
V Vapnik
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Knowledge of protein domain boundaries is critical for the characterisation and understanding of protein function. The ability to identify domains without knowledge of the structure, using sequence information only, is an essential step in many types of protein analyses. In the present study, we demonstrate that the performance of DomainDiscovery is improved significantly by including the inter-domain linker index value for domain identification from sequence-based information. Improved DomainDiscovery uses a Support Vector Machine (SVM) approach and a unique training dataset built on the principle of consensus among experts in defining domains in protein structure. The SVM was trained using a PSSM (Position Specific Scoring Matrix), secondary structure, solvent accessibility information and inter-domain linker index to detect possible domain boundaries for a target sequence. RESULTS: Improved DomainDiscovery is compared with other methods by benchmarking against a structurally non-redundant dataset and also CASP5 targets. Improved DomainDiscovery achieves 70% accuracy for domain boundary identification in multi-domain proteins. CONCLUSION: Improved DomainDiscovery compares favourably with other methods and excels in the identification of domain boundaries for multi-domain proteins as a result of introducing a support vector machine with the Benchmark_2 dataset.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Improved general regression network for protein domain boundary prediction

Author: A Ceroni
A Vieira
Abdur R Sikder
AK Jain
Albert Y Zomaya
AR Sikder
Bing Bing Zhou
C Chothia
C Civera
CC Lee
CR Robinson
DB Wetlaufer
FMG Pearl
G Pollastri
HC Van Leeuwen
HM Berman
J Chen
J Cheng
J Liu
J Sim
JCB Melo
JE Gewehr
JS Richardson
JSR Jang
M Dumontier
M Suyama
MJ Lehtinen
N Nagarajan
OV Galzitskaya
P Baldi
P Bork
Paul D Yoo
RA George
RE Schapire
RL Marsden
RR Copley
RR Joshi
RS Gokhale
S Prompramote
S Veretnik
SF Altschul
TA Holland
Y Freund
Publication venue: BioMed Central
Publication date: 13/02/2008

Background: Protein domains present some of the most useful information that can be used to understand protein structure and function. Recent research on protein domain boundary prediction has been mainly based on widely known machine learning techniques, such as Artificial Neural Networks and Support Vector Machines. In this study, we propose a new machine learning model (IGRN) that can achieve accurate and reliable classification with significantly reduced computation. The IGRN was trained using a PSSM (Position Specific Scoring Matrix), secondary structure, solvent accessibility information and inter-domain linker index to detect possible domain boundaries for a target sequence. Results: The proposed model achieved an average prediction accuracy of 67% on the Benchmark_2 dataset for domain boundary identification in multi-domain proteins and showed superior predictive performance and generalisation ability among the most widely used neural network models. On the CASP7 benchmark dataset, with 70.10% prediction accuracy, it also demonstrated performance comparable to existing domain boundary predictors such as DOMpro, DomPred, DomSSEA, DomCut and DomainDiscovery. Conclusion: The proposed model compares favourably with other existing machine learning based methods, as well as with widely known domain boundary predictors, on two benchmark datasets, and excels in the identification of domain boundaries in terms of model bias, generalisation and computational requirements.

Crossref

Michigan Technological University

PubMed Central

Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint

Author: A Andreeva
A Bateman
A Elofsson
A Lupas
A McPherson
A Sali
AE Todd
AJ Enright
B Rost
C Sander
C Vogel
CA Orengo
CH Wu
Christine A Orengo
D Baker
D Busso
D Vitkup
DT Jones
FMG Pearl
GA Reeves
I Letunic
IV Grigoriev
J Liu
J Park
J Thornton
J Westbrook
JA Ranea
JC Norvell
JC Wootton
JD Watson
JM Chandonia
K Karplus
KT Simons
M Linial
M Skovgaard
N Siew
PJ Kersey
R Sanchez
RA Laskowski
RC Stevens
RI Sadreyev
RL Marsden
Russell L Marsden
SA Lesley
SE Brenner
SH Kim
SK Burley
SR Eddy
TC Terwilliger
Tony A Lewis
W Minor
W Tian
Y Kim
Y Yan
Publication venue: BioMed Central
Publication date: 01/03/2007

BACKGROUND: Structural genomics initiatives were established with the aim of solving protein structures on a large scale. For many initiatives, such as the Protein Structure Initiative (PSI), the primary aim of target selection is to structurally characterise protein families which, so far, lack a structural representative. It is therefore of considerable interest to gain insights into the number and distribution of these families, and into the effort that may be required to achieve comprehensive structural coverage across all protein families. RESULTS: In this analysis we have derived a comprehensive domain annotation of the genomes using CATH, Pfam-A and Newfam domain families. We consider what proportions of structurally uncharacterised families are accessible to high-throughput structural genomics pipelines, specifically those targeting families containing multiple prokaryotic orthologues. In measuring the domain coverage of the genomes, we show the benefits of selecting targets from structurally uncharacterised domain families whilst also pursuing additional targets from large, structurally characterised protein superfamilies. CONCLUSION: This work suggests that such a combined approach to target selection is essential if structural genomics is to achieve comprehensive structural coverage of the genomes, leading to greater insights into structure and the mechanisms that underlie protein evolution.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UCL Discovery

Defining Signatures of Arm-Wise Copy Number Change and Their Associated Drivers in Kidney Cancers.

Author: Benstead-Hume G
Downs JA
Pearl FMG
Wooller SK
Publication venue: MDPI AG
Publication date: 24/08/2022

Using pan-cancer data from The Cancer Genome Atlas (TCGA), we investigated how patterns in copy number alterations in cancer cells vary both by tissue type and as a function of genetic alteration. We find that patterns in both chromosomal ploidy and individual arm copy number are dependent on tumour type. We highlight, for example, the significant losses in chromosome arm 3p and the gain of ploidy in 5q in kidney renal clear cell carcinoma tissue samples. We find that specific gene mutations are associated with genome-wide copy number changes. Using signatures derived from non-negative factorisation, we also find gene mutations that are associated with particular patterns of ploidy change. Finally, utilising a set of machine learning classifiers, we successfully predicted the presence of mutated genes in a sample using arm-wise copy number patterns as features. This demonstrates that mutations in specific genes are correlated with, and may lead to, specific patterns of ploidy loss and gain across chromosome arms. Using these same classifiers, we highlight which arms are most predictive of commonly mutated genes in kidney renal clear cell carcinoma (KIRC).
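As a rough illustration of how signatures might be derived by non-negative factorisation, the sketch below factorises a small samples-by-arms matrix V into non-negative factors W (per-sample signature weights) and H (per-signature arm profiles) using Lee and Seung style multiplicative updates. This is not the authors' pipeline: the choice of update rule, the rank, the arm labels and the toy copy number values are all assumptions made for the example, and the input is assumed to hold non-negative arm-level copy numbers.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorise V (n x m) into W (n x rank) and H (rank x m), all non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical toy data: rows are samples, columns are arms (3p, 3q, 5p, 5q).
v = [
    [2, 1, 3, 2],
    [2, 1, 3, 2],
    [1, 2, 2, 4],
    [1, 2, 2, 3],
]
w, h = nmf(v, rank=2)
print("reconstruction error:", frobenius_error(v, w, h))
```

Each row of H can be read as one candidate signature across arms, and each row of W as how strongly a sample expresses each signature. In practice a maintained implementation such as the NMF class in scikit-learn would normally be preferred over hand-written updates.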

Institute of Cancer Research Repository

Biologische Datenbanken

Author: A Bateman
AG Murzin
FMG Pearl
J Henikoff
L Falquet
R Apweiler
T Etzold
TK Attwood
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2004

Crossref

Aneuploidy tolerance caused by BRG1 loss allows chromosome gains and recovery of fitness.

Author: Benstead-Hume G
Choudhary JS
Downs JA
Pardo M
Pearl FMG
Roumeliotis TI
Schiavoni F
Zuazua-Villar P
Publication venue: Springer Science and Business Media LLC
Publication date: 01/04/2022

Aneuploidy results in decreased cellular fitness in many species and model systems. However, aneuploidy is commonly found in cancer cells and often correlates with aggressive growth, suggesting that the impact of aneuploidy on cellular fitness is context dependent. The BRG1 (SMARCA4) subunit of the SWI/SNF chromatin remodelling complex is frequently lost in cancer. Here, we use a chromosomally stable cell line to test the effect of BRG1 loss on the evolution of aneuploidy. BRG1 deletion leads to an initial loss of fitness in this cell line that improves over time. Notably, we find increased tolerance to aneuploidy immediately upon loss of BRG1, and the fitness recovery over time correlates with chromosome gain. These data show that BRG1 loss creates an environment where karyotype changes can be explored without a fitness penalty. At least in some genetic backgrounds, therefore, BRG1 loss can affect the progression of tumourigenesis through tolerance of aneuploidy

Institute of Cancer Research Repository

Sussex Research Online

Domains mediate protein-protein interactions and nucleate protein assemblies.

Cell physiology is governed by an intricate mesh of physical and functional links among proteins, nucleic acids and other metabolites. The recent information flood coming from large-scale genomic and proteomic approaches allows us to foresee the possibility of compiling an exhaustive list of the molecules present within a cell, enriched with quantitative information on concentration and cellular localization. Moreover, several high-throughput experimental and computational techniques have been devised to map all the protein interactions occurring in a living cell. So far, such maps have been drawn as graphs where nodes represent proteins and edges represent interactions. However, this representation does not take into account the intrinsically modular nature of proteins and thus fails in providing an effective description of the determinants of binding. Since proteins are composed of domains that often confer on proteins their binding capabilities, a more informative description of the interaction network would detail, for each pair of interacting proteins in the network, which domains mediate the binding. Understanding how protein domains combine to mediate protein interactions would allow one to add important features to the protein interaction network, making it possible to discriminate between simultaneously occurring and mutually exclusive interactions. This objective can be achieved by experimentally characterizing domain recognition specificity or by analyzing the frequency of co-occurring domains in proteins that do interact. Such approaches allow gaining insights on the topology of complexes with unknown three-dimensional structure, thus opening the prospect of adopting a more rational strategy in developing drugs designed to selectively target specific protein interactions
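The abstract above mentions analysing the frequency of co-occurring domains in proteins that interact. A minimal sketch of that counting step follows; the protein identifiers, domain annotations and interaction list are hypothetical toy data, and the function name `domain_pair_counts` is introduced only for this example. A raw count is just a starting point and says nothing by itself about statistical significance.

```python
from collections import Counter
from itertools import product


def domain_pair_counts(interactions, domains):
    """Count, over all interacting protein pairs, how often each unordered
    domain pair (one domain from each partner) is observed."""
    counts = Counter()
    for a, b in interactions:
        # Use a set so a domain pair is counted at most once per interaction.
        pairs = {tuple(sorted(p)) for p in product(set(domains[a]), set(domains[b]))}
        counts.update(pairs)
    return counts


# Hypothetical toy annotations: protein -> list of domains it contains.
domains = {
    "P1": ["SH3", "kinase"],
    "P2": ["proline_rich"],
    "P3": ["SH3"],
    "P4": ["proline_rich", "PDZ"],
}
# Hypothetical observed interactions between those proteins.
interactions = [("P1", "P2"), ("P3", "P4"), ("P1", "P4")]

counts = domain_pair_counts(interactions, domains)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

In this toy set the SH3 and proline_rich pair is present in every interaction, which is the kind of signal that would nominate it as a candidate binding-mediating pair. A fuller analysis would compare these counts against the frequency expected from domain abundance alone.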

Crossref

ART

CASP11 – An Evaluation of a Modular BCL::Fold-Based Protein Structure Prediction Pipeline

Author: A Leaver-Fay
A Ramanathan
A Raval
A Zemla
AA Canutescu
AW Fischer
Axel W. Fischer
BE Weiner
Bian Li
C Hardin
Carlos F. Lopez
DA Case
Daniel K. Putnam
DK Putnam
DR Roe
DT Jones
E Durham
EF Pettersen
FMG Pearl
H Kim
I Hanukoglu
J Bryngelson
J Ko
J Mendenhall
J Moult
J Yang
J-P Ryckaert
James C. Pino
JD Bryngelson
Jens Meiler
JK Leman
JN Onuchic
K Lindorff-Larsen
KW Plaxco
L Heo
M Feig
M Karakaş
MP Jacobson
N Woetzel
O Carugo
R Bonneau
R Dastvan
R Salomon-Ferrer
RJ Loncharich
S Heinze
S Lindert
S Miyamoto
S Ovchinnikov
ST Rao
Sten Heinze
T Hofmann
W Kabsch
W Zhang
WL Jorgensen
Yan Xia
Yang Zhang
Publication venue: Public Library of Science (PLoS)

Crossref

Structural Insights into the Evolution of a Sexy Protein: Novel Topology and Restricted Backbone Flexibility in a Hypervariable Pheromone from the Red-Legged Salamander, Plethodon shermani

Author: A Bilwes
A Davies
A Galat
A Garza-Garcia
Andrew N. Lane
BG Fry
C Boschat
C Dulac
C Perez-Iratxeta
CA Palmer
CR Wirsig-Wiechmann
D Jerusalinsky
D Sharma
Damien B. Wilburn
DB Wilburn
DR Melrose
EN Lyukmanova
EV Koonin
F Blasi
F Delaglio
F Laberge
FMG Pearl
G Mourier
H Xu
I Sillitoe
J Albrand
J Greenwald
J Thompson
JM Mudge
K Adermann
K Touhara
Kari A. Doty
Kathleen E. Bowen
Kazushige Touhara
KM Kiemnec-Tyburczy
L Stowers
LD Houck
M Garcia-Boronat
M Piotto
MA Liénard
MA Wouters
MRE Symonds
MV Berjanskii
P Chamero
P Güntert
P Karlson
P-L Wu
Pamela W. Feldhoff
PB McIntosh
R Das
RA Laskowski
Richard C. Feldhoff
RM Kini
RP Joosten
RS McDowell
S Haga
S Roberts
S Yoshinaga
SA Roberts
Sengodagounder Arumugam
SJ Arnold
SM Rollmann
T Herrmann
T Leinders-Zufall
V Tsetlin
W Kabsch
WeilleJR de
WPC Stemmer
WR Gray
Y Isogai
Y Shen
Y-L Lin
Publication venue: Public Library of Science (PLoS)

Crossref